Coding Code: Investigating Students’ Data Science Skills with Qualitative Methods

Today’s layout


  1. (Briefly) Outline research on student learning through code
  2. Describe a framework for qualitatively analyzing students’ computing code
  3. Motivate how this framework could be used for learning trajectory research
  4. Set you free to perform a data analysis!


Investigating student learning through code

A computer screen with an image of source code

What research has been done?

A great deal of research has focused on what to teach in data science courses, but little has focused on how students learn data science concepts.


Thus far we have detailed…

  • concepts or competencies that ought to be included in data science programs

  • perspectives on when to teach data science

  • how to teach data science concepts

  • methods for integrating data science into the classroom

  • assorted topics to be considered in data science courses

Drawing on research in Computer Science Education


An image of a bluebird.

An image of a student thinking about computer code.

The Importance of Students’ Attention to Program State (Lewis 2012)


  • Attends to both the code produced by a student and their learning process

  • Pairs a student’s code with their debugging behavior side-by-side

A student looking sad while typing on a computer; an image of a bug over their code appears above their head.


Analyses of students’ code like these should not be few and far between. Students’ code offers a unique avenue for qualitative research on the teaching and learning of computing.

A framework for analyzing students’ code (Schulte 2008)

| Level | Text Surface | Program Execution | Function |
|---|---|---|---|
| Macrostructure | Understanding the overall structure of the program | Understanding the “algorithm” of the program | Understanding the goal / purpose of the program (in its context) |
| Relations | References between blocks, e.g., method calls, object creation | Sequence of method calls, object sequence diagrams | Understanding how sub-goals are related to goals, how function is achieved by subfunctions |
| Blocks | Regions of interest (ROI) that syntactically or semantically build a unit | Operation of a block, a method, or a ROI (as a sequence of statements) | Function of a block, may be seen as a sub-goal |
| Atoms | Language elements | Operation of a statement | Function of a statement, only understandable in context |

Atom

with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))


Text Surface

How is whitespace being used?

Program Execution

What operation(s) does this statement carry out?

Function

How is this statement related to the broader context of the program?
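One possible way to make the "language elements" of an atom visible is to ask R's parser for its tokens. The sketch below is only an illustration, not part of the framework itself: it parses the atom as text (without running it, so the data set is not needed) and lists its terminal tokens with `utils::getParseData()`. The variable names `atom`, `tokens`, `function_calls`, and `symbols` are assumptions introduced here.

```r
# The atom from the slide, kept as text so it is parsed but never evaluated
atom <- "with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))"

# keep.source = TRUE is needed so the parse data is retained
parsed <- parse(text = atom, keep.source = TRUE)
tokens <- utils::getParseData(parsed)

# Keep only terminal tokens, i.e. the pieces a reader sees on the text surface
tokens <- tokens[tokens$terminal, c("token", "text")]

# Which functions are called, and which objects are referred to by name?
function_calls <- tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]
symbols <- tokens$text[tokens$token == "SYMBOL"]

print(tokens, row.names = FALSE)
```

A token listing like this can support the Text Surface and Program Execution questions; the Function question still needs the surrounding program and the researcher's judgment.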

Block

anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)  
summary(anterior)  
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))  
abline(anterior)  
plot(anterior)


Program Execution

What operation(s) does this block carry out?

Function

How is this block related to the broader context of the program?

Relationships Between Blocks


anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)  
summary(anterior)  
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))  
abline(anterior)  
plot(anterior)


posterior2 <- lm(ProximateAnalysisDataOutlier$PSUP ~ ProximateAnalysisDataOutlier$Lipid)
summary(posterior2)
with(ProximateAnalysisDataOutlier, plot(PSUP~Lipid, las=1, xlab = "Whole-body Lipid Content (%)", ylab = "UP Fatmeter Reading"))
abline(posterior2)
plot(posterior2)
posterior2
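As a rough aid for comparing the two blocks above, one could extract the sequence of top-level function calls from each. This is a hedged sketch under stated assumptions: `top_level_calls()` is a hypothetical helper written for this illustration, it only parses the code (nothing is evaluated), and it labels a bare object name such as `posterior2` as `"<print>"` because R would auto-print it.

```r
block_a <- 'anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)
summary(anterior)
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))
abline(anterior)
plot(anterior)'

block_b <- 'posterior2 <- lm(ProximateAnalysisDataOutlier$PSUP ~ ProximateAnalysisDataOutlier$Lipid)
summary(posterior2)
with(ProximateAnalysisDataOutlier, plot(PSUP~Lipid, las=1, xlab = "Whole-body Lipid Content (%)", ylab = "UP Fatmeter Reading"))
abline(posterior2)
plot(posterior2)
posterior2'

# Hypothetical helper: name of the outermost function in each statement
top_level_calls <- function(code) {
  exprs <- as.list(parse(text = code, keep.source = FALSE))
  vapply(exprs, function(e) {
    if (!is.call(e)) return("<print>")            # bare symbol, auto-printed
    is_assignment <- identical(e[[1]], as.name("<-"))
    if (is_assignment && is.call(e[[3]])) e <- e[[3]]  # look at the right-hand side
    as.character(e[[1]])
  }, character(1))
}

calls_a <- top_level_calls(block_a)
calls_b <- top_level_calls(block_b)

# Statements present in the second block but not the first
setdiff(calls_b, calls_a)
```

Both blocks share the same call sequence; the second adds axis labels (visible only at the atom level) and a final statement that prints the model object.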

How can this be used for learning trajectory research?

Atom-level Analysis


“How does a student’s use of code comments (to structure their analysis) change over time?”
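A question like this one could be given a simple quantitative companion, for example the share of non-blank lines that are comments in each script a student submits. The sketch below is an assumption-laden illustration: `comment_share()` is a hypothetical helper, the two scripts are invented, and only full-line comments are counted (trailing comments after code are ignored).

```r
# Hypothetical helper: proportion of non-blank lines that are full-line comments
comment_share <- function(script_lines) {
  non_blank <- script_lines[nzchar(trimws(script_lines))]
  if (length(non_blank) == 0) return(NA_real_)
  is_comment <- grepl("^[[:space:]]*#", non_blank)
  mean(is_comment)
}

# Invented scripts from one student at two points in a course
scripts <- list(
  week_02 = c("x <- c(1, 2, 3)", "mean(x)"),
  week_08 = c("# Load data", "x <- c(1, 2, 3)", "", "# Summarise", "mean(x)")
)

shares <- sapply(scripts, comment_share)
shares
```

A count like this says nothing about what the comments are for; the qualitative coding is what shows whether they structure the analysis.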

Block-level Analysis


“How does a student’s data analysis process change over time?”

Some tools to guide you


Descriptive coding


RPMA2GrowthSub$Weight[RPMA2GrowthSub$Age == 1]


“Filters a vector of values using extraction operator, based on an equality relation with a variable selected from dataframe using $ operator”
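To see how the descriptive code maps onto the statement, the extraction can be unpacked one operation at a time. This is a minimal sketch: `growth` is a hypothetical stand-in for `RPMA2GrowthSub`, and its values are invented for illustration.

```r
# Hypothetical stand-in for RPMA2GrowthSub; values are invented
growth <- data.frame(
  Age    = c(1, 2, 1, 3),
  Weight = c(10.5, 20.1, 11.2, 30.4)
)

age_col    <- growth$Age        # `$` selects one variable from the data frame
is_age_one <- age_col == 1      # equality relation yields a logical vector
weights    <- growth$Weight[is_age_one]  # `[` keeps values where the index is TRUE

# The one-line form used by the student gives the same result
one_line <- growth$Weight[growth$Age == 1]
```

Each intermediate object corresponds to one phrase in the descriptive code, which can help when deciding how fine-grained a code needs to be.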

Process coding

Uses gerunds (“-ing” words) to connote action in the data (Saldaña 2013)


anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)  
summary(anterior)  
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))  
abline(anterior)  
plot(anterior)


“Fitting a linear regression, inspecting regression summary, plotting scatterplot of variables in regression, adding a regression line to the plot, visualizing model diagnostics”
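One way to keep process codes attached to the statements they describe is a small table with one row per statement. The sketch below is an assumed bookkeeping layout, not a prescribed method: the object name `coded_block` and its column names are hypothetical, while the statements and codes are taken from the example above. The statements are stored as text and never run.

```r
statements <- c(
  "anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)",
  "summary(anterior)",
  "with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))",
  "abline(anterior)",
  "plot(anterior)"
)

process_codes <- c(
  "Fitting a linear regression",
  "Inspecting regression summary",
  "Plotting scatterplot of variables in regression",
  "Adding a regression line to the plot",
  "Visualizing model diagnostics"
)

# One row per statement, pairing the code with its process code
coded_block <- data.frame(
  statement    = statements,
  process_code = process_codes,
  stringsAsFactors = FALSE
)

# Quick check that every process code opens with an "-ing" word
starts_with_gerund <- grepl("^[[:alpha:]]+ing ", coded_block$process_code)
```

A table in this shape can be stacked across students or time points, which makes it easier to compare sequences of process codes later.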

Let’s give it a try!

Why is this important for data science education?


How can we distinguish merely interesting learning from effective learning (Wiggins and McTighe 2005)?

Questions?

An image with two thought bubbles labeled question and answer.

Practical considerations

How much code should I collect?

  • Driven by the research question!
    • Amount of each student’s code
    • Number of students

How do readers trust my analysis?

  • Trust comes from:
    • confirmability
    • reliability
    • credibility
    • transferability


Excellent resources: Creswell & Poth (2018); Merriam & Tisdell (2016); Miles et al. (2020)

References

Corbin, Juliet, and Anselm Strauss. 2008. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. Thousand Oaks, CA: Sage.
Creswell, J. W., and C. N. Poth. 2018. Qualitative Inquiry & Research Design. Thousand Oaks, CA: Sage.
Lewis, Colleen M. 2012. “The Importance of Students’ Attention to Program State.” Proceedings of the Ninth Annual International Conference on International Computing Education Research, September. https://doi.org/10.1145/2361276.2361301.
Merriam, S. B., and E. J. Tisdell. 2016. Qualitative Research. San Francisco, CA: John Wiley & Sons.
Miles, M. B., A. M. Huberman, and J. Saldaña. 2020. Qualitative Data Analysis. Thousand Oaks, CA: Sage.
Saldaña, J. 2013. The Coding Manual for Qualitative Researchers. Thousand Oaks, CA: Sage.
Schulte, Carsten. 2008. “Block Model.” Proceedings of the Fourth International Workshop on Computing Education Research, September. https://doi.org/10.1145/1404520.1404535.
Wiggins, G., and J. McTighe. 2005. Understanding by Design. 2nd ed. Alexandria, VA: Association for Supervision and Curriculum Development (ASCD).